Co-tuning of Software Specializers and Hardware Accelerators within a CNN Application
Abstract
Software specializers and hardware accelerators share the goal of reducing the runtime of an operation while remaining parameterizable and hiding the underlying optimizations from users. Because candidate hardware accelerators compete for reconfigurable hardware resources, tuning must take place at the application level rather than at the operation level, as is the case for software specializers. This paper presents a methodology for co-tuning software specializers and hardware accelerators so that both can be used simultaneously in an application. To explore the validity of this approach, experiments were carried out with software-specialized and hardware-accelerated 2D stencils performing convolutions for trial convolutional neural networks. The results demonstrate that an application-level co-tuner can discover which operations are best suited to software specializers and which merit the limited reconfigurable hardware resources required for hardware acceleration.
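The application-level selection described in the abstract can be illustrated with a minimal sketch. It is not the paper's method: it assumes each operation has a measured software-specialized runtime, a hardware-accelerated runtime, and a hardware resource cost, and it searches exhaustively for the set of operations to accelerate under a shared resource budget. The names `OPS` and `co_tune` and all numbers are hypothetical.

```python
from itertools import combinations

# Hypothetical measurements, invented purely for illustration:
# name -> (software-specialized runtime, hardware-accelerated runtime,
#          reconfigurable resource units needed by the accelerator)
OPS = {
    "conv1": (8.0, 3.0, 60),
    "conv2": (5.0, 1.0, 50),
    "conv3": (2.0, 1.5, 40),
}


def co_tune(ops, budget):
    """Return (operations to accelerate in hardware, total runtime).

    Every subset of operations is tried as the hardware-accelerated set;
    subsets whose combined resource cost exceeds the budget are skipped.
    Operations outside the chosen subset run with the software specializer.
    """
    names = sorted(ops)
    # Baseline: everything runs in software.
    best_hw = frozenset()
    best_time = sum(ops[n][0] for n in names)
    for k in range(1, len(names) + 1):
        for hw in combinations(names, k):
            if sum(ops[n][2] for n in hw) > budget:
                continue  # accelerators do not fit together
            total = sum(ops[n][1] if n in hw else ops[n][0] for n in names)
            if total < best_time:
                best_hw, best_time = frozenset(hw), total
    return best_hw, best_time


if __name__ == "__main__":
    chosen, runtime = co_tune(OPS, budget=100)
    print(sorted(chosen), runtime)
```

With these made-up numbers, `conv2` has the largest per-operation speedup, yet the best application-level choice under a budget of 100 units leaves it in software, because accelerating `conv1` and `conv3` together saves more total runtime. Exhaustive search is only practical for a handful of operations; a real tuner would need a smarter search strategy.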
Similar resources
Synergy: A HW/SW Framework for High Throughput CNNs on Embedded Heterogeneous SoC
Convolutional Neural Networks (CNNs) have been widely deployed in diverse application domains. There has been significant progress in accelerating both their training and inference using high-performance GPUs, FPGAs, and custom ASICs for datacenter-scale environments. The recent proliferation of mobile and IoT devices has necessitated real-time, energy-efficient deep neural network inference on...
Low Complexity Multiply-Accumulate Units for Convolutional Neural Networks with Weight-Sharing
Convolutional neural networks (CNNs) are one of the most successful machine learning techniques for image, voice and video processing. CNNs require large amounts of processing capacity and memory bandwidth. Hardware accelerators have been proposed for CNNs which typically contain large numbers of multiply-accumulate (MAC) units, the multipliers of which are large in integrated circuit (IC) gate ...
Compilation and Parallelization Techniques with Tool Support to Realize Sequence Alignment Algorithm on FPGA and Multicore
Reconfigurable computing (RC), such as computing with field-programmable gate array (FPGA) technology, has been shown to accelerate a large variety of applications. RC fills the gap between hardware and software, achieving higher performance than software while maintaining a remarkable amount of flexibility. Though there are bottlenecks associated w...
Towards Computational Efficiency of Next Generation Multimedia Systems
High throughput demands under complexity- and power-efficiency constraints have imposed numerous design challenges for next-generation multimedia systems. Multimedia (especially video) applications impose tight throughput constraints (e.g., frame resolutions beyond 1920×1080, at more than 30 FPS), which must be met by possibly resource- and battery-constrained underlying hardware. However, technology scalin...
Enabling Inter-Machine Parallelism in High-Level Languages with SEJITS and MapReduce
Selective, embedded, just-in-time specialization (SEJITS) is a technique for optimizing embedded domain-specific languages through the use of specializers, or code modules developed by expert programmers that target particular accelerators such as multicore processors and GPUs via just-in-time compilation. We extend SEJITS to exploit inter-machine parallelism by targeting clusters of machines via...
Publication date: 2016